Abstract

We present a new circuit for non-Boolean recognition of binary images.
Employing all-spin logic (ASL) devices, we design logic comparators and non-Boolean 
decision blocks for compact and efficient computation. By manipulation of fan-in 
number in different stages of the circuit, the structure can be extended for larger 
training sets or larger images. Building on the notion of mainly similar images, the
system constructs a mean image and compares it with a separate input
image within a short decision time. Taking advantage of the non-volatility of ASL 
devices, the proposed circuit is capable of hybrid memory/logic operation. Compared 
with existing CMOS pattern recognition circuits, this work achieves a smaller footprint, 
lower power consumption, faster decision time and a lower operational voltage. 


1 Introduction 

Pattern recognition, and image recognition in particular, has been widely studied
in machine learning and image processing. Hardware demonstration of computation
units for pattern recognition, however, has consistently been a challenging problem in
terms of chip size, power consumption, computational complexity and decision speed.

Among different solid-state technologies, CMOS offers low-cost, highly integrated,
low-power implementations of pattern recognition [4]-[^ and processing
systems. For Boolean logic systems, CMOS gates exhibit processing speeds up to a few GHz
and can be designed to have low static power. However, the dynamic power consumption
of a large system with a GHz clock frequency can still limit scalability. Fan-in and
fan-out considerations for CMOS devices also affect the speed, power consumption and
size of the devices. Besides Boolean systems, some novel non-Boolean techniques have been
developed to overcome these issues. In non-Boolean systems, logic gates are no longer
the key building block, and analog/mixed-signal circuits are used instead. In [^, the
authors propose a technique for non-Boolean training and detection of image pixels using
a network of coupled oscillators. This structure can detect any scaled or rotated version
of a desired image. However, the method suffers from high computational complexity,
large area, high power consumption and a long convergence time, which limit its
application to large image arrays. Other proposed CMOS systems have
demonstrated artificial neural networks (ANNs) by designing circuits that emulate neurons
and synapses [^[^. In these systems, the large computational burden leaves the search open
for new solutions.

To overcome the limitations of CMOS devices, other technologies are being investigated
for pattern recognition applications. Spintronic devices, in particular, have received
considerable attention recently because of unique properties such as low-voltage operation
and non-volatility. In [^, a non-volatile logic-in-memory full adder is fabricated using
magnetic tunnel junctions (MTJs). The proposed architecture is compared with a 0.18 µm CMOS
counterpart and exhibits major advantages: the dynamic power consumption is reduced by 23%
because there are fewer paths from Vdd to GND, the static power consumption is eliminated
owing to non-volatility, and the chip area is also reduced. As stated in [11], ASL devices
are also non-volatile, and the computational state is preserved when the power to the circuit
is turned off. In [12], a spin-based artificial neural network (ANN) is proposed using lateral
spin valves to achieve low power consumption and a low operational voltage. Spin switches
have also been proposed for building compact neurons and synapses. In [14], all-spin logic
(ASL) and charge-spin logic (CSL) devices are shown to be capable of Boolean and
non-Boolean operations, which makes them an attractive choice for building fundamental blocks
such as ring oscillators. Recently, the design of ASL gates with graphene channels has
been proposed [^. Given the unique spin-transport properties of graphene,
the design of Boolean and non-Boolean computation units with these new devices can be
investigated as a future direction. Moreover, majority gate operation of ASL devices has
previously been introduced in some Boolean logic systems [19,^. This feature
can overcome the fan-in and fan-out limitations of large integrated systems.
In addition, the inverting and non-inverting operation modes of ASL devices are key
to designing many logic circuits, e.g., full adders and multipliers [^. The time-domain
transient behavior of the magnetization in these devices provides another degree of
freedom for demonstrating non-Boolean operations, as will be discussed later in this paper.
These features enable us to design a compact all-spin logic non-Boolean structure with
low power consumption and low computational complexity.

In this paper, we propose a novel pattern recognition circuit that takes advantage of
several distinctive features of spintronic devices, such as non-volatility, efficient
implementation of majority gates and XOR functions, and the ability to distinguish strong
and weak majorities. The non-volatility of the devices enables large sets of training images
to be stored within the logic with no standby power dissipation. It also enables instant-on
operation and avoids the energy and delay penalties of loading training images from a main
memory.

The rest of this paper is organized as follows. Section II describes the operation of
ASL devices. The proposed approach and the basics of the computation are given in Section
III. In Section IV, the proposed architecture and a comprehensive discussion of design
considerations are presented. Simulation results and a summary are given in Sections V
and VI, respectively.


2 All-Spin Logic Devices 


The electron spin is introduced as a new state variable in spintronic devices to process and
store information. This alternative to charge-based systems offers the possibility
of ultra-low-voltage operation and an easier demonstration of digital systems,
owing to the bistable nature of spin [T^.

In all-spin logic devices, input and output data are represented by the magnetization of
two ferromagnets that communicate through a spin-coherent metallic channel.
The physical view of these devices is shown in Fig. 1(a). As shown in Fig. 1(b-d) and
discussed in [^, the voltage applied to the input ferromagnet creates a flow of electrons
from the supply to ground. This flow of electrons becomes
spin-polarized when passing through the input ferromagnet. Since the concentration of
spin-polarized electrons differs between the input side and the output side of the channel,
the electrons diffuse to the output side. The spin-polarized electrons that accumulate under
the output ferromagnet can switch its magnetization orientation by exerting
a spin-transfer torque.

As shown in [19], these devices are concatenable, exhibit nonlinear characteristics
and support all Boolean operations. In all-spin logic operation, the spin signal directly
switches the nanomagnet and can be passed on to the next
stage. Because the information is stored in the magnetization of the magnets, the input and
output magnets can effectively be regarded as digital capacitors linked by a spin-coherent
channel. The sign and magnitude of the control voltages applied to the magnets determine
the polarity of the majority-spin electrons and the device speed, respectively. Any change of
magnetization in the bistable input magnet exerts a spin current through the channel,
and this current determines the magnetization of the output magnet [17]. The
channel between the two magnets can be either a metal or a semiconductor [25]. In our
modeling and simulations we assume a copper interconnect.


The models used in this work are based on [Tt], where the different physical effects
are captured. The parameters of the channel, magnet and interface that determine
performance characteristics such as the spin injection, detection and transport
efficiency are taken into account. The most important size-effect parameters for the purpose
of this work are the sidewall specularity, the grain boundary reflectivity and the average
grain size [T^. The average grain size is assumed to be equal to the width or thickness of
the metals [1^. The complete list of parameters is given in Appendix B.

Figure 1: (a) Configuration of a single ASL device. (b) The voltage applied to the magnet
creates an electric field and drives electron movement. (c) Spin-polarized electrons at
the input side exhibit a higher density than at the output side. (d) The diffusion
of spin-polarized electrons towards the output magnet changes the output magnetization
direction. (e) An ASL majority gate with 3 inputs [^. The 3 input magnets, M1, M2
and M3, are connected to the output magnet MO using 3 metallic interconnects.

2.1 Majority Gate Operation 

As mentioned earlier, the ASL device supports majority operation, as shown in Fig. 1(e).
This is possible because the net spin current into the output magnet is the sum of the
spin currents from all the input devices. In principle, such a gate
can be designed for a large number of inputs. As a trade-off, increasing the number
of input devices causes their uncorrelated thermal noise to add
up and disturb the transient magnetization of the output magnet, so the fan-in must be
chosen carefully. As will be discussed later, if only the final steady-state value of the
output magnetization is monitored, the number of input devices can be increased as long as
the output magnetization remains predictable. Depending on the
device properties, this sets a practical limit on the number of inputs to
a majority gate. On the other hand, if the transient behavior of the output
magnetization matters, fewer inputs should be connected to the output magnet to avoid
accumulating the noise of the input devices. In our simulations, the transient
output magnetization for the 3- and 5-input cases is less affected by thermal noise than for
higher fan-in numbers. Note that the steady-state value of the majority gate depends on
the sign of the voltage applied to the magnets. With a negative applied voltage,
the output magnetization settles to the exact majority of the
input magnetizations. With a positive applied voltage, the steady-state value of
the output magnet is the complement of the majority of the input magnetizations.
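
The polarity dependence described above can be summarized with a purely logical Python sketch. It idealizes the gate as a noise-free Boolean function of its steady state; the function name `asl_majority` and the `voltage_sign` argument are illustrative assumptions, not part of the simulation models used in this work.

```python
def asl_majority(inputs, voltage_sign):
    """Idealized steady-state output of an ASL majority gate.

    inputs: iterable of 0/1 logic values (odd count, so ties cannot occur).
    voltage_sign: -1 for a negative control voltage (true majority),
                  +1 for a positive control voltage (complemented majority).
    """
    bits = list(inputs)
    if len(bits) % 2 == 0:
        raise ValueError("fan-in must be odd")
    if voltage_sign not in (-1, 1):
        raise ValueError("voltage_sign must be -1 or +1")
    majority = 1 if 2 * sum(bits) > len(bits) else 0
    # A positive control voltage inverts the result (hypothetical logical model).
    return majority if voltage_sign < 0 else 1 - majority
```

For example, `asl_majority([1, 1, 0], -1)` gives the majority value 1, while the same inputs with `voltage_sign=+1` give its complement.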

The interesting phenomenon in ASL majority gates arises from the dependence of the
transient behavior of the output magnetization on the number of input magnetizations
that agree. This can be understood from the fact that the transferred spin torque increases
when more magnets are magnetized in the same direction. Fig. 2 shows
different scenarios of transient output magnetization in majority gates with 3 and 5
inputs. As shown in Fig. 2(a), in a majority gate with 5 inputs, the output
magnetization switches faster when more inputs have the same magnetization
direction. As the number of magnets with the same magnetization decreases, the switching
becomes slower and the effect of thermal noise is more pronounced. In Fig. 2(b), the
switching transitions of two majority gates with 3 and 5 inputs are compared. Because
less thermal noise accumulates in the gate with 3 inputs than in
the gate with 5 inputs, the 3-input gate exhibits a more deterministic transition when
the net spin currents into both gates are equal.
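
As a rough illustration of why these two gates can be compared at equal net spin current, the sketch below counts spin contributions in units of a single input. It assumes, for illustration only, that every input injects the same magnitude and that opposite magnetizations cancel one for one; `net_spin_units` is a hypothetical helper, not a device model.

```python
def net_spin_units(inputs):
    """Net spin current into the output magnet, in units of one input's contribution.

    Assumption: each input contributes +1 for logic 1 (+X) and -1 for logic 0 (-X).
    """
    bits = list(inputs)
    ones = sum(bits)
    return ones - (len(bits) - ones)


three_all_same = net_spin_units([1, 1, 1])         # 3 agreeing inputs
five_four_same = net_spin_units([1, 1, 1, 1, 0])   # 4 agreeing inputs, 1 opposing
```

Under this assumption both cases yield three net units, so any difference between the two transients would come from the noise of the additional inputs rather than from the drive strength.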


2.2 Switching Delay Variation 


The switching time of a ferromagnet is calculated in [^ using the small cone-angle
approximation:

$$\tau_{sw} = \tau_0 \, \frac{\ln(\pi/\theta_0)}{\chi - 1} \tag{1}$$

where $\tau_0$ is a fitting parameter, $\theta_0$ is the initial angle of the magnet, and
$\chi$ is the ratio of the magnitude of the injected spin current to the critical spin
current required to switch the magnetization of the output ferromagnet. According to this
equation, the switching delay decreases as the injected spin current increases. As shown in
[17], the channel in this device can be approximated as an RC network; hence, the injected
spin current and the supply voltage are directly correlated. Therefore, the switching
delay decreases as the supply voltage increases, approximately in inverse proportion once
the injected spin current is well above the critical value. This result is shown in Fig.
3. The device parameters used for these simulations are listed in Appendix B.
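
The trend can be illustrated with a short Python sketch, assuming the small cone-angle expression has the form τ0·ln(π/θ0)/(χ − 1). The values of `tau0` and `theta0`, and the linear voltage-to-χ map with its `v_crit_mv` parameter, are illustrative assumptions and are not taken from Appendix B.

```python
import math


def switching_time(chi, tau0=1.0, theta0=0.1):
    """Small cone-angle estimate of the switching time (same units as tau0)."""
    if chi <= 1.0:
        raise ValueError("injected spin current must exceed the critical value")
    if not 0.0 < theta0 < math.pi:
        raise ValueError("theta0 must lie in (0, pi)")
    return tau0 * math.log(math.pi / theta0) / (chi - 1.0)


def chi_from_voltage(v_mv, v_crit_mv=1.0):
    """Hypothetical linear map from supply voltage to chi (chi = V / V_crit)."""
    return v_mv / v_crit_mv


# Delay for a few supply voltages in mV; larger voltages give shorter delays.
delays = [switching_time(chi_from_voltage(v)) for v in (5, 10, 20, 40)]
```

Because the delay scales as 1/(χ − 1), the inverse dependence on voltage is only approximate near the critical current and becomes accurate for χ much larger than 1.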


Figure 2: (a) Switching transient response for different scenarios of input magnetization in
a majority gate with 5 inputs. (b) Comparison of the switching transitions of majority gates
with 3 and 5 inputs. In this comparison, all input magnets of the 3-input gate have the same
magnetization. For the gate with 5 inputs, 4 inputs have the same magnetization, so the
net spin current is equal to that of the other gate. The voltage applied to the magnets in
these simulations is -5 mV.

2.3 Impact of Thermal Noise 

Figure 3: Switching delay variation versus the supply voltage (mV). Each point is simulated 3
times to verify the results.

The thermal motion of electrons inside the ferromagnets is the main cause of thermal
noise. The amount of noise is correlated with the temperature of the magnets and directly
affects the steady-state precession angle $\theta_0$. Based on previously reported derivations,

$$\langle \theta_0^2 \rangle = \frac{k_B T}{E_b} \tag{2}$$

In this equation, $E_b$ is the barrier energy, $k_B$ is the Boltzmann constant and $T$ is the
temperature. This thermal effect acts as the main source of noise affecting the output
magnetization. In our simulations, $\theta_0$ can differ by 5% to 10% from the analytic
solution, depending on the parameters.
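
To give a feel for the magnitudes involved, the following Python sketch evaluates the root-mean-square initial angle, assuming the thermal relation takes the form ⟨θ0²⟩ = k_B·T / E_b. The barrier of 40 k_B·T used in the example is an illustrative value, not a parameter of the simulated devices.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K


def rms_initial_angle(barrier_in_kt):
    """RMS angle in radians when the barrier is given in units of k_B*T."""
    if barrier_in_kt <= 0:
        raise ValueError("barrier must be positive")
    return math.sqrt(1.0 / barrier_in_kt)


def rms_initial_angle_si(e_b_joule, temperature_k):
    """Same quantity with the barrier in joules and the temperature in kelvin."""
    if e_b_joule <= 0 or temperature_k <= 0:
        raise ValueError("barrier and temperature must be positive")
    return math.sqrt(K_B * temperature_k / e_b_joule)


# Hypothetical example: a 40 k_B*T barrier at 300 K.
theta_rms = rms_initial_angle(40.0)
theta_rms_si = rms_initial_angle_si(40.0 * K_B * 300.0, 300.0)
```

A lower barrier or a higher temperature increases the typical initial angle, which in turn shortens the logarithmic factor in the switching-time estimate.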


3 Pattern Recognition Scheme 

As in any recognition system, we consider two major phases of
operation. The first is the “learning” phase, in which the desired pattern is stored in
memory. In the “detection” phase, the circuit evaluates the similarity between the input
data and the stored pattern according to the decision-making criteria. In the learning phase,
the circuit can receive a single image or a training set. The training set includes multiple
training images from different users.

In this section, we propose a new technique using all-spin logic devices and establish a 
fully spin-based operation. By illustrating several examples, we verify the performance for 
various image sizes. 

3.1 Mainly Similar Images

We first provide the mathematical definition of mainly similar images and then show how it
helps in training the circuit. In our simulations, all images are binary-valued
matrices with 0 and 1 representing white and black pixels, respectively. In our circuit,
logic “0” corresponds to magnetization in the -X direction
and logic “1” corresponds to magnetization in the +X direction.

For a given pair of binary vectors x and y of equal length L, the Hamming distance
is defined as

$$d(x, y) = \sum_{i=1}^{L} \left(1 - \delta_{x_i, y_i}\right)$$

where $x_i$ and $y_i$ denote the $i$-th components of x and y, respectively, and $\delta$ is
the Kronecker delta. We use this quantity as a measure of similarity between two
images.

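Definition 1 Two binary images $B$ and $B' \in \{0,1\}^{m \times n}$ are called mainly
similar if the majority of pixels in every pair of corresponding rows are identical. More
specifically,

$$d(B_{i,\cdot}, B'_{i,\cdot}) \le \left\lfloor \frac{n}{2} \right\rfloor, \quad i = 1, \dots, m \tag{3}$$

where $B_{i,\cdot}$ denotes the $i$-th row of $B$ and $\lfloor a \rfloor$ represents the
floor operation on $a$ (i.e., the largest integer not greater than $a$).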

This comparison ensures that the two images have mostly identical pixels along the
corresponding rows. For the purpose of this paper, we consider the comparison along the
rows, although a column-wise comparison could be established with no loss of generality.
As illustrated in Fig. 4, being mainly similar along the rows does not imply being similar
along the columns.
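
The row-wise test can be sketched in a few lines of Python. The sketch takes the row threshold to be ⌊n/2⌋ differing pixels, which corresponds to a strict majority of identical pixels when n is odd; the two 3 x 3 images are hypothetical examples constructed to show the row/column asymmetry and are not the images of Fig. 4.

```python
def hamming(x, y):
    """Number of positions at which two equal-length binary sequences differ."""
    if len(x) != len(y):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(x, y))


def mainly_similar(img_a, img_b):
    """True if every pair of corresponding rows differs in at most floor(n/2) pixels."""
    if len(img_a) != len(img_b):
        raise ValueError("images must have the same number of rows")
    return all(hamming(ra, rb) <= len(ra) // 2 for ra, rb in zip(img_a, img_b))


def transpose(img):
    """Swap rows and columns so the same test can be applied column-wise."""
    return [list(col) for col in zip(*img)]


# Hypothetical images: identical except for the third column.
A = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
B = [[0, 0, 1], [0, 0, 1], [0, 0, 1]]
rows_similar = mainly_similar(A, B)
cols_similar = mainly_similar(transpose(A), transpose(B))
```

Here each row differs in a single pixel, so the images pass the row-wise test, while the third columns differ in all three pixels and fail the column-wise test.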


3.2 Majority Training and Decision Making 

In the learning phase, we train the circuit by providing a number of mainly similar images. 
In reality, these images could be different representations of a target image (say a character 
or a certain binary pattern). We build up a representative of the given similar images by 
constructing a so-called mean image. 

Definition 2 For a set of $P$ binary images $B_1, B_2, \dots, B_P \in \{0,1\}^{m \times n}$,
the corresponding mean image, denoted as $\bar{B}$, is a binary image with entries

$$\bar{B}(i,j) = \mathrm{nint}\!\left(\frac{1}{P} \sum_{k=1}^{P} B_k(i,j)\right). \tag{4}$$



Figure 4: The two images are mainly similar (along the rows); however, the Hamming
distance between the third columns is 3, so the images are not similar along the columns.


In this equation, nint denotes the nearest integer function. In our circuit, the mean image
represents the pattern desired by the users and is used as a reference. Since this matrix
is constructed using all-spin majority gates, the number of training images, P, is taken
to be odd and is upper bounded by the maximum number of inputs to a majority gate, as
discussed in subsection 2.1.

After the training data is stored and the mean image is constructed, we make a row-wise
comparison between the input and the mean image. As we will see in the next section,
depending on the initial value of the output magnetization, the non-Boolean row
decision-maker can return the total count of matches or mismatches between the compared
rows of the input image and the mean image.
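
A functional sketch of these two steps in Python is given below. For an odd number of training images the nearest-integer mean of binary pixels equals their majority, so the sketch uses integer arithmetic; the function names and the three training images are hypothetical and serve only as an illustration.

```python
def mean_image(images):
    """Pixel-wise majority over an odd number P of equally sized binary images."""
    p = len(images)
    if p % 2 == 0:
        raise ValueError("the number of training images must be odd")
    rows, cols = len(images[0]), len(images[0][0])
    # For odd P, nint(sum / P) is 1 exactly when more than half of the pixels are 1.
    return [[1 if 2 * sum(img[i][j] for img in images) > p else 0
             for j in range(cols)]
            for i in range(rows)]


def row_match_counts(reference, candidate):
    """Number of identical pixels in each pair of corresponding rows."""
    return [sum(a == b for a, b in zip(r, c)) for r, c in zip(reference, candidate)]


# Hypothetical training set from three users and one input image.
T1 = [[1, 0, 1], [0, 1, 0], [1, 1, 1]]
T2 = [[1, 0, 0], [0, 1, 0], [1, 0, 1]]
T3 = [[1, 1, 1], [0, 1, 1], [1, 1, 1]]
mean = mean_image([T1, T2, T3])
matches = row_match_counts(mean, [[1, 0, 1], [1, 0, 0], [1, 1, 1]])
```

In this example the mean image coincides with `T1`, and the input matches it fully in rows 1 and 3 but in only one pixel of row 2.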

4 Proposed Structure and Design Considerations 

Based on the pattern recognition scheme presented in Section III, we study two different
implementations of the circuit. By comparing the performance of the two
versions of the single pixel comparator unit, we choose the one with more capabilities, at
the expense of slightly higher power consumption and occupied area. In the single pixel
comparator, the circuit receives the training pixels from P different users, and the mean
image is subsequently constructed. The value of the mean pixel is then compared with
the corresponding value in the input image, and the steady-state magnetization of the Pixel
magnet stores the result. Both versions of this unit are based on the
idea of training the circuit with a set of mainly similar images and comparing the single
pixels of the input image with their counterparts in the mean image. To perform
the required operations, the single pixel comparator needs a memory to store the training
data, a logic comparator and a circuit to construct the mean pixel. As previously
mentioned, the mean pixel can be constructed by an all-spin majority gate; for the
memory and the comparator, we propose a new circuit in the following subsection.


4.1 Memory+Logic Comparator 


1-bit full adder structures with a total of 5 nanomagnets have been designed in prior
work, including [24]. By properly configuring the circuit in [^, we use it as an area- and
power-efficient comparator (XNOR) block, as shown in Fig. 5. The two inputs to this block (A
and B) come from distinct sources: one comes from the input image,
synchronized with the control voltage, and the other is given to the circuit during the
learning phase. Compared to a CMOS counterpart, this structure has important
advantages. First, it requires 5 magnets, whereas the CMOS version requires at least 8
transistors for an XNOR implementation. Second, this circuit can store
the training information without extra static power consumption, whereas in CMOS additional
power is consumed to store this data [^. Taking advantage of the non-volatile operation
of ASL devices, the input magnets of this circuit store the binary data, and the
stored information is later used to determine the magnetization direction of the next
stages. Fig. 6 shows the simulated output waveform (sum magnet) of the XNOR block for
different scenarios of input magnetization. Because breakdown current
effects must be considered [23], we choose a 5 mV supply voltage in our simulations, which
ensures that the current density is safely below the breakdown value. For channels with
higher breakdown current densities, higher voltages can be applied to increase the operation
speed. In this simulation, a control voltage of 5 mV is applied to the magnets at t = 0.
The total power consumption of the XNOR gate is 11 µW and the estimated area is less
than 0.3 µm². When the control voltage is applied to the XNOR gate, the output magnetization
remains in the -X orientation (the initial condition in this simulation) if the pixel
values are different. If the inputs to the gate are identical, the output magnetization
switches to the +X direction, as shown in Fig. 6. Note that the initial condition
of the output magnet does not change the final magnetization orientation.

Figure 5: 1-bit full adder used as XNOR [^. In the 2D implementation of this work, X
and Y wires are in-plane metal wires and connections along the Z axis are vias.



Figure 6: Simulated output waveforms of XNOR gate 


4.2 Construction of the mean pixel 

As a reliable and simple way to extract the information from the training set, we construct
the mean image as discussed in Section III. The ASL majority gate, whose schematic is
shown in Fig. 1(e), provides a low-power and efficient implementation of the mean image.
The inputs to this majority gate come from P different users, and the images that the system
receives during the learning phase are constrained to be mainly similar along the rows. When
the control voltage is applied to the magnets of this gate, the output magnetization either
switches to the other value or remains in the same orientation. If the applied
control voltage is negative, the final output magnetization is the majority of
the input magnetizations. If the control voltage is positive, the output magnetization
settles to the complement of the majority of the input magnetizations. In this system,
since we apply uniform positive voltages, the majority gates settle to the complementary
majority value. To extract more information from the operation of the majority gates in
this circuit, we assume a uniform initial magnetization orientation for the output
magnets of each stage of majority gates. This enables us to recognize the total count of
matches or mismatches between the input magnetizations of each majority gate, as we will
discuss later. The total power consumption of each majority gate in this circuit is 3.75 µW
and the estimated area is less than 0.2 µm².


4.3 Single Pixel Comparator 

With the required blocks in place, we propose two versions of the single pixel
comparator.

4.3.1 Standard implementation 

The schematic of this implementation and the table detailing its operation are shown
in Fig. 7. This circuit operates in the order discussed in Section III. The first stage
of the circuit is a majority gate whose inputs come from the P users in the learning phase.
The output of this majority gate settles to the corresponding mean pixel value and
is connected to a comparator circuit, whose other input comes from
the input image. The connection is made through a short metallic interconnect to minimize
the delay. When the learning phase is over and the detection phase starts, applying the
control voltage across the magnets of the comparator circuit makes the “Pixel” magnet settle
to the result of comparing the mean pixel with the input pixel. Note that the
input pixel can be applied to magnet Qij after the Pij magnetization settles to the mean
pixel; hence, no extra memory circuit is required to store the value of Pij.


4.3.2 Comparator-First implementation 

In this version, there are as many comparator circuits at the input side as there are
training images. The comparators share the input image pixel, Qij,
and differ in their other input, which comes from the corresponding training
image. The output magnets of the comparators are connected to the Pixel magnet through
metallic interconnects in a majority gate configuration. During the learning phase, the
pattern pixels are stored in the corresponding input magnets. Applying the control voltage
to the magnets of the circuit starts the detection phase, and the “Pixel” magnetization
settles to the result of comparing the mean pixel with the input pixel. The schematic of
the circuit and the detailed operation table are shown in Fig. 8.

Figure 7: (a) Standard single pixel detector schematic. (b) The truth table with the
detailed operation of the circuit.

Figure 8: (a) Comparator-first pixel detector schematic. (b) The truth table with the
detailed operation of the circuit.

As can be verified by comparing the last columns of Fig. 7(b) and Fig. 8(b), the
“Pixel” steady-state value is identical in the two versions. To verify that the two
implementations produce identical outputs in the general case, we
have to prove that the majority operation and the comparison (XOR/XNOR) operation
are interchangeable, i.e.,

Proposition 1 Given $x, y_1, y_2, \dots, y_P$ as binary variables and $P$ as an odd integer,

$$x \oplus \mathrm{nint}\!\left(\frac{1}{P}\sum_{k=1}^{P} y_k\right) = \mathrm{nint}\!\left(\frac{1}{P}\sum_{k=1}^{P} (x \oplus y_k)\right), \tag{5}$$

where $\oplus$ denotes the XOR operation. The mathematical proof of this proposition is given
in Appendix A.
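
As a complement to the formal proof, the identity can be checked exhaustively for small odd P with a short Python sketch. This is only a brute-force sanity check over all binary assignments, not a substitute for the proof; the helper names are illustrative.

```python
from itertools import product


def nint_mean(bits):
    """Nearest integer of the mean of an odd number of bits, i.e. their majority."""
    return 1 if 2 * sum(bits) > len(bits) else 0


def proposition_holds(p):
    """Check x XOR majority(y) == majority(x XOR y_k) for all inputs of size p."""
    for x in (0, 1):
        for ys in product((0, 1), repeat=p):
            lhs = x ^ nint_mean(ys)
            rhs = nint_mean([x ^ y for y in ys])
            if lhs != rhs:
                return False
    return True


results = {p: proposition_holds(p) for p in (1, 3, 5, 7)}
```

The check passes for every odd P tested. With an even P a tie can occur, and the identity then depends on how the tie is rounded, which is why P is restricted to odd values.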

Although the standard implementation has slightly lower power consumption (fewer
devices) and a smaller area, we select the comparator-first design as the unit
cell of this circuit, because the output magnetization transient of this
circuit provides more information on the similarity between the training pixels and the
input pixel. Based on Fig. 7(b) and Fig. 8(b), the final values of the output magnetization
in the two cases are identical. However, the comparator-first output magnetization comes
from a majority gate and switches when the majority of pattern pixels have the same value
as the input pixel. If the majority gate at the output of the comparator-first circuit has a
low fan-in (e.g., < 5), the switching transient is less sensitive to the accumulated thermal
noise and conveys information on the number of training pixels with identical values.
In the standard implementation, on the other hand, the output magnetization
comes from the XNOR circuit and, as Fig. 7(b) indicates, its transient conveys no
information on the number of training pixels with identical values. This distinction is
particularly important when the user tracks, in the detection phase, the total count of
pattern pixels that share the same value.

4.4 Non-Boolean Row Decision-Maker 

The last stage of the proposed circuit uses the transient behavior of the ASL majority gate
to quickly decide whether the input image and the mean
image are mainly similar along the rows. The inputs to this majority gate come from the
“Pixel” magnets of the pixels along the same row of the image. The connection is made
through short interconnects to minimize the delay. As mentioned before, the spin torque
transferred from the input magnets to the output magnet in the ASL majority gate is
determined by the magnetization of the input devices. As the number of devices with the
same magnetization orientation increases, the transferred spin torque increases; hence, the
output magnetization switches faster according to (1). With a proper choice of the control
voltage timing and of the dimensions of the nanomagnets and metallic interconnects in this
gate, reliable decision-making based on the transient behavior of the output magnetization
is achieved. This final majority gate is sensitive to the uncorrelated thermal noise of the
input magnets; hence, an intentionally low fan-in (< 5) has to be selected. In our
simulations, 3 magnets from the previous pixel stages are connected to this gate and, as
will be shown in the simulation results, reliable decision-making is achieved.
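
The logical behavior of one row of the circuit can be sketched in Python as follows. This is a noise-free functional model under simplifying assumptions: it ignores the polarity inversion introduced by positive control voltages and replaces the switching transient by an explicit match count. All names and the example pixel values are hypothetical.

```python
def xnor(a, b):
    """1 if the two bits are equal, else 0."""
    return 1 if a == b else 0


def majority(bits):
    """Majority of an odd number of bits."""
    bits = list(bits)
    return 1 if 2 * sum(bits) > len(bits) else 0


def pixel_comparator_first(training_pixels, input_pixel):
    """XNOR the input pixel with each training pixel, then take the majority."""
    return majority(xnor(t, input_pixel) for t in training_pixels)


def row_decision(pixel_outputs):
    """Return (row is mainly similar, number of matching pixels) for one row."""
    matches = sum(pixel_outputs)
    return majority(pixel_outputs) == 1, matches


# Hypothetical row of 3 pixels, each trained by P = 3 users.
row_training = [(1, 1, 0), (0, 0, 0), (1, 1, 1)]
row_input = [1, 1, 1]
pixels = [pixel_comparator_first(t, q) for t, q in zip(row_training, row_input)]
decision = row_decision(pixels)
```

In the circuit the match count is not read out as a number; according to the discussion above it is reflected in how quickly the row gate switches, with more agreeing "Pixel" magnets giving a faster transition.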

The complete circuit for the full image comparison consists of two stages: the unit
pixel comparator and the row majority gate. The structure consisting of the
comparator-first circuits and the Row majority gate for a 3 x 3 pixel image comparison is
called the “Smart Detector Cell”. This naming convention simplifies the discussion of the
operation in the next section. We call these detector cells smart because they can perform
the multiple tasks of “storage”, “Boolean computation” and “non-Boolean decision-making” in
a time-efficient manner. The schematic of this circuit is shown in Fig. 9. The total power
consumption of this circuit is 115 µW and the occupied area is less than 0.5 µm².

Note that our simulations take the effect of magnetization switching into account.
The ASL device acts like a resistive network, so its power
consumption does not change with time. In these devices, the current passes through only
one magnet and therefore does not change with time, while the switching of the magnets on
the input side influences the switching delay of the last magnet. In this paper, we consider
the worst-case delay, which includes the switching delay from the input magnet of the
first device to the output magnet of the last device as well as the transport delay within
the metallic interconnects. For the DC power estimates, we performed both
DC and transient simulations, and the results are consistent. The low
operational voltage of the circuit leads to low power consumption.

In a real implementation of this work, read/write circuits would be added to fully realize
the circuit. However, this paper focuses on the processing circuit and does not address
the feeding and extraction of the input and output data. To feed the input data,
spin-polarized currents are used to initialize the magnetization of the input magnets based
on the training images, similar to [^. The number of write units is equal
to the number of pixels, while there is only one output, which translates to a small
overhead. The decision data is in the form of a time delay and can be stored on a capacitor,
where the delay determines the amount of stored charge. Another way to extract the output
data is to use MTJ devices, as mentioned in [33].


5 Simulation Results 

In this section, we provide two different examples to show the reliable performance of smart 
detector cells. 


5.1 Non-Boolean Hamming Distance Identifier of 3x3 Pixel Pattern and 
Input Image 

In this example, we have only one training image and one input image. To compare 
these two images, we need 9 XNOR gates to identify the similarity of corresponding pixels 
in the two images and 3 majority gates with a fan-in of 3 to decide on the similarity of the 
corresponding rows. In this case, the mean image is simply the pattern image itself. The 
smart detector cell in Fig. 9 has 3 comparator-first circuits and a Row majority gate. The 
mainly similarity of the rows is determined by the Pixel majority gates. The last majority 
gate in this case settles to +X magnetization if at least 2 rows are mainly similar. However, 
in this simulation, the output magnetization of this gate is not important. The initial 
magnetizations of the comparators and the majority gate outputs are set to 1. Fig. 10 shows 
the two images as well as the transient magnetization of various magnets. The pixel 
waveforms overlap in some cases, which is why only 3 pixels are shown in this figure. As 
expected, the comparator outputs switch for pixels P21 and P22, since their values in the 
input image and the pattern image are different. For the rest of the pixels, the comparator 
output is +X magnetization and does not switch. Consequently, row 1 and row 3 both exhibit 
perfect similarity, and the outputs of the corresponding majority gates switch within the 
shortest time, as can be seen by comparison with Fig. 2(b). On the other hand, row 2 
exhibits a mismatch and therefore cannot switch to the -X magnetization orientation. The 
control voltage of 5 mV is applied to all the magnets at t = 0, and the circuit compares the 
two images in less than 0.6 ns. Compared to CMOS circuits, this is a much lower operational 
voltage and a much shorter decision time. 

Figure 9: Structure of the unit smart detector cell 

Figure 10: A single smart detector cell compares these 3x3 pixel images. 
The waveforms of the comparators and majority gates are shown at the bottom. 
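
The logical part of this example can be illustrated with a short Python sketch. It models only the Boolean behaviour described above (9 XNOR comparators, 3 pixel majority gates with a fan-in of 3, and one row majority gate); it does not model magnetization polarity, switching delay, or thermal noise. The function names and the two test images are hypothetical and are chosen so that the images differ only at P21 and P22.

```python
from typing import List

Image = List[List[int]]  # 3x3 binary image, rows of 0/1 pixels


def xnor(a: int, b: int) -> int:
    """Return 1 when the two binary pixels agree, 0 otherwise."""
    return 1 if a == b else 0


def majority(bits: List[int]) -> int:
    """Return 1 when more than half of the inputs are 1 (odd fan-in assumed)."""
    return 1 if 2 * sum(bits) > len(bits) else 0


def smart_detector_cell(pattern: Image, image: Image):
    """Boolean model of one cell: comparators, pixel majority gates, row majority gate."""
    matches = [[xnor(p, q) for p, q in zip(prow, irow)]
               for prow, irow in zip(pattern, image)]
    row_similar = [majority(row) for row in matches]      # one gate per row
    hamming = sum(1 - m for row in matches for m in row)  # number of differing pixels
    return matches, row_similar, majority(row_similar), hamming


# Hypothetical pattern image and an input that differs only at P21 and P22.
pattern = [[1, 0, 1],
           [0, 1, 0],
           [1, 0, 1]]
image = [[1, 0, 1],
         [1, 0, 0],
         [1, 0, 1]]

matches, row_similar, decision, hamming = smart_detector_cell(pattern, image)
print(row_similar, decision, hamming)
```

In this sketch, rows 1 and 3 are reported as mainly similar and row 2 is not, which mirrors the qualitative behaviour of the majority gates in Fig. 10.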

5.2 Non-Boolean Similarity Comparison of a 9x9 Pixel Image and a Set 
of 3 Pattern Images 

To use the smart detector cells for larger images, the cells must be designed carefully. 
Here, we develop a circuit for training with 9x9 pixel images and 
perform a non-Boolean comparison between the constructed mean image and the 9x9 
pixel input image. In this simulation, 3 different users write the word “Spin” using their own 
choice of pixels. The 3 pattern images are shown in Fig. 11. 

In the detection phase, a new user of the circuit chooses an arbitrary image of interest 
as the input. As an example in this simulation, the user chooses the word “swim”, as shown 
in Fig. 12 (left). The circuit should compare this image with the mean image constructed 
from the training set. 

Figure 11: Training set for the 9x9 pixel images 

Figure 12: The input image (left) and the mean image (right). 

The mean image of the training set is also shown in Fig. 12 (right). One particular 
advantage of constructing the mean image can be discussed here. As can be seen in the 
mean image, pixels that are mistakenly valued by a single user (e.g., P26 and P49) 
in the learning phase are automatically corrected when the mean image is constructed. 
This is especially useful when users train the system with multiple versions of an image 
during the learning phase to make sure that the mean image represents their desired 
pattern. The mistaken values could be due to any source of error or distortion. In an 
ideal case where the effect of thermal noise can be ignored, the circuit can compare these 
two large images simply by changing the fan-in of different stages in the smart detector 
cell. However, since our simulations model the thermal noise accurately, the fan-in 
considerations mentioned before are particularly important. Based on these considerations, 
we break these 9x9 images into smaller 3x3 subimages, each of which can be compared by a 
single smart detector cell unit. The 9 smart detector cells can operate in parallel, and the 
circuit configuration can be determined by the user. This breakdown also provides more 
information on the pixels, since we can check the mainly similarity for smaller blocks of 
the original image. The breakdowns of the mean image (squares on the right) and the input 
image (squares on the left) are shown in 3x3 partitions in Fig. 13. 
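
As a minimal sketch of the two steps discussed here, the following Python code builds a mean image by a pixel-wise majority vote over the training images and splits a 9x9 image into 3x3 subimages. The majority vote is an assumed software stand-in for the mean image construction, an odd number of training images is assumed so that no ties occur, and the function names and test data are hypothetical.

```python
from typing import List

Image = List[List[int]]  # binary image, rows of 0/1 pixels


def mean_image(training: List[Image]) -> Image:
    """Pixel-wise majority vote over the training images (odd count assumed)."""
    n = len(training)
    rows, cols = len(training[0]), len(training[0][0])
    return [[1 if 2 * sum(img[r][c] for img in training) > n else 0
             for c in range(cols)] for r in range(rows)]


def split_blocks(image: Image, size: int = 3) -> List[List[Image]]:
    """Split a square image into size x size subimages, indexed [block_row][block_col]."""
    n = len(image)
    return [[[row[bc:bc + size] for row in image[br:br + size]]
             for bc in range(0, n, size)]
            for br in range(0, n, size)]


# Hypothetical 9x9 training set: two clean copies and one copy with a wrong pixel.
base = [[(r + c) % 2 for c in range(9)] for r in range(9)]
noisy = [row[:] for row in base]
noisy[1][5] ^= 1  # a single user mis-sets pixel P26 (row 2, column 6)

mean = mean_image([base, noisy, base])
blocks = split_blocks(mean)
print(mean == base)                    # the isolated error is voted out
print(len(blocks), len(blocks[0]))     # 3 x 3 grid of subimages
print(blocks[0][1])                    # subimage covering rows 1-3, columns 4-6
```

Each of the 9 subimages returned by `split_blocks` corresponds to the input of one smart detector cell in this simplified picture.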



[Figure: pairs of 3x3 blocks labeled Input and Mean; clusters C42 and C72 are marked] 

Figure 13: Due to fan-in considerations, the circuit consists of 9 smart detector cells. 
The corresponding breakdowns of the mean image and the input image are shown here. 

To distinguish the different rows of the smaller blocks, we use the notation of Cij 
clusters, where Cij represents the elements of row i from column 3j - 2 to column 3j. 
The magnetization waveforms in Fig. 14 and Fig. 15 separately show the output 
magnetizations of smart detector cells for various clusters. The common initial condition of 
the output magnet in this simulation is the -X magnetization orientation. In Fig. 14, the 
switching delays of the output magnetizations for the clusters with a perfect match (C11, C22 
and C41) and those with 1 mismatch (C52, C42 and C32) can be easily distinguished. This 
phenomenon was previously described as the unique feature of ASL majority gates and 
helps users identify the number of mismatches along different rows. At the same 
time, the output magnetizations of clusters with the same level of similarity are very 
close in the time domain, which makes this non-Boolean decision-making a reliable metric. On 
the other hand, in Fig. 15, the output magnetization cannot switch for clusters with 
mismatches (C43 and C72), and, as can be seen, the level of precession is not the same for 
different mismatch levels. This is due to the different amounts of spin torque provided in 
these two cases. With a very high-resolution study of the output magnetization, this 
can help to identify the number of mismatches; however, the switching transient is a more 
reliable metric, and the same information can be extracted by repeating the simulation with 
the output magnet initial condition set to +X magnetization. 

Figure 14: The switching delay of the output magnetization in the last stage represents the 
similarity of the input data and the pattern data. 
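
The cluster notation and the qualitative outcomes described for Fig. 14 and Fig. 15 can be sketched in Python as follows. This is only a bookkeeping model: it counts mismatched pixels per cluster and maps the count to the behaviour reported in the text for a fan-in of 3. It does not compute any delay, and the function names and the test images are hypothetical.

```python
from typing import Dict, List, Tuple

Image = List[List[int]]  # binary image, rows of 0/1 pixels


def cluster_mismatches(mean: Image, image: Image) -> Dict[Tuple[int, int], int]:
    """Count mismatched pixels in every cluster Cij (row i, columns 3j-2..3j, 1-based)."""
    counts = {}
    for i in range(1, len(mean) + 1):
        for j in range(1, len(mean[0]) // 3 + 1):
            cols = range(3 * j - 3, 3 * j)  # 0-based columns of cluster Cij
            counts[(i, j)] = sum(mean[i - 1][c] != image[i - 1][c] for c in cols)
    return counts


def expected_behaviour(mismatches: int) -> str:
    """Qualitative outcome of a fan-in-3 majority gate, following the text."""
    if mismatches == 0:
        return "switches, shortest delay"
    if mismatches == 1:
        return "switches, longer delay"
    return "does not switch"


# Hypothetical 9x9 data, not the images used in the simulations.
mean = [[0] * 9 for _ in range(9)]
image = [row[:] for row in mean]
image[3][3] = 1                 # one mismatch in row 4, columns 4-6 (cluster C42)
image[6][3] = image[6][4] = 1   # two mismatches in row 7, columns 4-6 (cluster C72)

counts = cluster_mismatches(mean, image)
for key in [(1, 1), (4, 2), (7, 2)]:
    print(f"C{key[0]}{key[1]}:", counts[key], expected_behaviour(counts[key]))
```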



Figure 15: Since these clusters represent mismatches, they cannot switch and the initial 
magnetization does not change. Note that the y-axis ranges from -1.002 to -0.998, in 
contrast with Fig. 14, in which the y-axis ranges from -1 to 1. 
Table 1: Performance comparison with existing CMOS systems 

Reference        [7]        [12]          This Work 
Decision time    30 ns      N.A.          1 ns 
Image size       32x32      86 neurons    9x9 
DC power         N.A.       2.2 mW        990 µW 
Area             N.A.       0.015 mm²     < 1 µm² 
Technology       CMOS       Spin-CMOS     All-spin 


As can be seen in all the simulation results, this circuit can make a decision in about 
1000 ps for a 9x9 pixel image, whereas in CMOS this decision time cannot be less than a 
few nanoseconds. For a detailed comparison between the two technologies, Table 1 compares 
the performance of this circuit with that of a few existing CMOS circuits. 
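
As a rough reading of Table 1, the following Python sketch multiplies the reported DC power by the reported decision time. Treating this product as the energy per decision is an assumption of the sketch (it ignores read/write overhead and any idle time), and the variable names are illustrative. The decision-time ratio is not a like-for-like comparison, since the image sizes in Table 1 differ.

```python
# Values reported for this work in Table 1.
dc_power_w = 990e-6       # DC power of the 9-cell circuit, in W
decision_time_s = 1e-9    # decision time, in s
num_cells = 9             # smart detector cells in the 9x9 circuit

# Assumption: energy per decision is approximately DC power x decision time.
energy_j = dc_power_w * decision_time_s
energy_per_cell_j = energy_j / num_cells

# Decision-time ratio relative to the 30 ns reported for reference [7].
ratio_vs_ref7 = 30e-9 / decision_time_s

print(f"energy per decision: {energy_j * 1e12:.2f} pJ")
print(f"energy per cell:     {energy_per_cell_j * 1e12:.2f} pJ")
print(f"decision-time ratio vs. [7]: {ratio_vs_ref7:.0f}x")
```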

6 Conclusion 

We have presented a novel non-Boolean image recognition circuit based on all-spin logic 
devices. The introduced circuit can perform all the phases of non-Boolean pattern 
recognition for binary images. Taking advantage of the non-volatility of ASL devices, the 
learning phase is performed without any additional memory devices. By introducing 
the mainly similarity scheme, two different implementations of the circuit are proposed. As 
verified by simulation results, this circuit can recognize binary image patterns of various 
sizes faster than existing CMOS counterparts and consumes less power, with an operational 
voltage of 5 mV. Since the comparisons in this circuit are based on ASL majority gates, 
the computational complexity of the operation is also lower than that of existing circuits. 
The proposed circuit has applications in fast and low-power image recognition for security, 
medical imaging, and sensing. 


Appendix A: Proof of proposition 1 

In this appendix, we mathematically verify (7). Since all the variables are binary-valued, 

$$0 \le \frac{1}{P}\sum_{k=1}^{P} y_k \le 1.$$

The nint operation results in 0 when 

$$\frac{1}{P}\sum_{k=1}^{P} y_k < \frac{1}{2}, \qquad (8)$$

otherwise it results in 1. Therefore, there are 4 different possibilities for the variables, as 
shown in Table 2. In order to simplify the notation, we also define 

$$z_k = x \oplus y_k, \quad \forall k \in \{1, \dots, P\}. \qquad (9)$$


Table 2: Possibilities of $x, y_1, \dots, y_P$ 

$x$    $\sum_{k=1}^{P} y_k$    $\mathrm{nint}(\frac{1}{P}\sum_{k=1}^{P} y_k)$    $x \oplus \mathrm{nint}(\frac{1}{P}\sum_{k=1}^{P} y_k)$ 
0      $< P/2$                 0                                                 0 
0      $> P/2$                 1                                                 1 
1      $< P/2$                 0                                                 1 
1      $> P/2$                 1                                                 0 


Here, we verify (7) for the first row of Table 2. Using the same method for the other 3 
rows, the proposition can be completely proved. 

If $\sum_{k=1}^{P} y_k < P/2$, fewer than $P/2$ of the $y_k$'s are 1. Given $x = 0$, this 
means that fewer than $P/2$ of the $z_k$'s are 1 and the rest are zero, i.e., 

$$\sum_{k=1}^{P} z_k < \frac{P}{2}.$$

Similar to (8), by applying the nearest integer function, 

$$\mathrm{nint}\left(\frac{1}{P}\sum_{k=1}^{P} z_k\right) = 0.$$
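
The case analysis above can also be checked numerically. The Python sketch below assumes that the proposition has the form suggested by Table 2 and the definition of z_k, namely that x XOR nint((1/P) sum of y_k) equals nint((1/P) sum of z_k). It tests this identity exhaustively for a few small odd values of P. Odd P is assumed so that the mean never equals exactly 1/2, and the function names are hypothetical.

```python
from itertools import product


def nint_mean(bits):
    """Nearest integer of the mean of binary values; odd len(bits) rules out ties."""
    return 1 if 2 * sum(bits) > len(bits) else 0


def proposition_holds(P: int) -> bool:
    """Exhaustively check x XOR nint(mean(y)) == nint(mean(x XOR y_k)) for one odd P."""
    for x in (0, 1):
        for y in product((0, 1), repeat=P):
            lhs = x ^ nint_mean(y)
            rhs = nint_mean([x ^ yk for yk in y])
            if lhs != rhs:
                return False
    return True


print(all(proposition_holds(P) for P in (1, 3, 5, 7)))
```

An exhaustive check of this kind covers all four rows of Table 2 at once, but only for the listed values of P; it complements the proof rather than replacing it.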
Appendix B: Simulation Parameters 


Parameter                            Symbol      Value 

Size Effect Parameters [18] 
Side Wall Specularity                p           0 
Grain Boundary Reflectivity          R           0.2 

Interface Parameters (Co/Cu) [28] 
Majority Spin Conductance            G↑          0.375 1/Ω 
Minority Spin Conductance            G↓          0.125 1/Ω 
Real Spin-Mixing Conductance         Re G↑↓      3.4375 1/Ω 
Imaginary Spin-Mixing Conductance    Im G↑↓      9.37x10^-[illegible] 1/Ω 

Ferromagnet (Co) [31] 
Ferromagnet Length                   Lx          75.00 nm 
Ferromagnet Width                    Ly          25.00 nm 
Ferromagnet Height                   Lz          3.00 nm 
Gilbert Damping Coefficient          α           0.0021 
Gyromagnetic Ratio                   γ           1.76x10^11 1/(s·T) 
Saturation Magnetization             Ms          1.45x10^6 A/m 
Number of Spins in Magnet            Ns          1.34x10^6 1/V 
Energy Density                       Ku          0.5x10^6 J/m³ 

Channel (Cu) [32] 
Channel Length                       Lint        212.5 nm 
Channel Width                        Wint        50 nm 
Thickness/Width Aspect Ratio         AR          2.0 
Channel Thickness                    Hint        100.0 nm 
Cross Section Area                   A           5000 nm² 
Finite Difference Spacing            Δx          10.0 nm 
Conductivity                         σ           41.549 1/(µΩ·m) 
Diffusion Coefficient                D           0.014 m²/s 
Permeability                         µ           0.003 m²/(V·s) 
Spin Relaxation Time                 τs          10.939 ps 
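
As a consistency check on the channel parameters, the following Python sketch derives two quantities from the listed values. The relation λs = sqrt(D·τs) for the spin diffusion length is the standard textbook expression and is not stated in this paper, so its use here is an assumption, as are the variable names.

```python
import math

# Channel (Cu) parameters taken from the table above, converted to SI units.
D = 0.014            # diffusion coefficient, m^2/s
tau_s = 10.939e-12   # spin relaxation time, s
L_int = 212.5e-9     # channel length, m
W_int = 50e-9        # channel width, m
H_int = 100e-9       # channel thickness, m

# Assumption: spin diffusion length from the standard relation sqrt(D * tau_s).
lambda_s = math.sqrt(D * tau_s)

# Cross-section area implied by width x thickness, in nm^2.
area_nm2 = W_int * H_int * 1e18

print(f"spin diffusion length: {lambda_s * 1e9:.1f} nm")
print(f"channel length / spin diffusion length: {L_int / lambda_s:.2f}")
print(f"cross-section area: {area_nm2:.0f} nm^2")
```

Under this assumption the spin diffusion length is about 391 nm, so the 212.5 nm channel is shorter than one diffusion length, and the width and thickness reproduce the listed 5000 nm² cross-section.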